Abstract

Coronavirus spike proteins from different genera are divergent, although they all mediate coronavirus entry into cells by binding to host receptors and fusing viral and cell membranes. Here we determined the cryo-EM structure of porcine delta coronavirus (PdCoV) spike protein at 3.3-angstrom resolution. The trimeric protein contains three receptor-binding S1 subunits that tightly pack into a crown-like structure and three membrane-fusion S2 subunits that form a stalk. Each S1 subunit contains two domains, the N-terminal domain (S1-NTD) and the C-terminal domain (S1-CTD). PdCoV S1-NTD has the same structural fold as alpha- and beta-coronavirus S1-NTDs as well as host galectins, and it recognizes sugar as its potential receptor. PdCoV S1-CTD has the same structural fold as alpha-coronavirus S1-CTDs, but its structure differs from that of beta-coronavirus S1-CTDs. PdCoV S1-CTD binds to an unidentified receptor on host cell surfaces. PdCoV S2 is locked in the pre-fusion conformation by structural restraint of S1 from a different monomeric subunit. PdCoV spike possesses several structural features that may facilitate immune evasion by the virus, such as its compact structure, concealed receptor-binding sites, and shielded critical epitopes. Overall, this study reveals that delta-coronavirus spikes are structurally and evolutionarily more closely related to alpha-coronavirus spikes than to beta-coronavirus spikes; it also has implications for the receptor recognition, membrane fusion, and immune evasion by delta-coronaviruses as well as coronaviruses in general.

Significance

In this study we determined the cryo-EM structure of porcine delta coronavirus (PdCoV) spike protein at 3.3-angstrom resolution. This is the first atomic structure of a spike protein from the delta coronavirus genus, which is divergent in amino acid sequence from the well-studied alpha- and beta-coronavirus spike proteins. We describe the overall structure of the PdCoV spike and the detailed structure of each of its structural elements, and we analyze the function of each element. Based on these structures and functions, we discuss the evolution of PdCoV spike protein in relation to the spike proteins from other coronavirus genera. This study combines the structure, function, and evolution of coronavirus spike proteins, and provides insights into the receptor recognition, membrane fusion, immune evasion, and evolution of PdCoV spike protein.

Introduction

Coronaviruses are large enveloped RNA viruses that can be classified into four genera: α, β, γ, and δ (1). Both α- and β-coronaviruses infect mammals, γ-coronaviruses infect birds, and δ-coronaviruses infect mammals and birds (1). Representative coronaviruses include: human NL63 coronavirus (HCoV-NL63) and porcine transmissible gastroenteritis coronavirus (TGEV) from the α genus; mouse hepatitis coronavirus (MHV), bovine coronavirus (BCoV), SARS coronavirus (SARS-CoV) and MERS coronavirus (MERS-CoV) from the β genus; avian infectious bronchitis virus (IBV) from the γ genus; and porcine delta coronavirus (PdCoV) from the δ genus (2). Coronaviruses from different genera demonstrate distinct serotypes, mainly due to the divergence of their envelope-anchored spike proteins (3). The spike proteins mediate viral entry into host cells by first binding to host receptors through their S1 subunit and then fusing host and viral membranes through their S2 subunit (4). Hence they are critical determinants of viral host range and tissue tropism, and also induce most of the host immune responses (5). Knowing the structure and function of the spike proteins from different genera is critical for understanding cell entry, pathogenesis, evolution, and immunogenicity of coronaviruses (6).

The receptor recognition pattern by coronaviruses is complicated (7). The S1 subunits from α- and β-coronavirus spikes contain two domains, the N-terminal domain (S1-NTD) and the C-terminal domain (S1-CTD). Depending on the virus, either one or both of the S1 domains can function as the receptor-binding domain (RBD) by binding to host receptors. On the one hand, S1-CTDs from α- and β-coronaviruses have different tertiary structures, but they share a common structural topology, indicating a common evolutionary origin and subsequent divergent evolution of S1-CTDs (7). α-coronavirus S1-CTDs recognize either angiotensin-converting enzyme 2 (ACE2) or aminopeptidase-N (APN) as their protein receptor, whereas β-coronavirus S1-CTDs recognize either ACE2 or dipeptidyl peptidase 4 (DPP4) (8-16). Hence S1-CTDs likely have undergone further divergent evolution to recognize different receptors. On the other hand, S1-NTDs from α- and β-coronaviruses both have the same structural fold as human galectins, and they recognize either sugar receptors or a protein receptor, CEACAM1 (17-23). Hence it has been suggested that coronavirus S1-NTDs originated from host galectins and have undergone divergent evolution to recognize different receptors (7). These studies on receptor recognition by coronaviruses have revealed complex evolutionary relationships among the spikes from different genera.

The membrane fusion mechanism for coronavirus spikes is believed to be similar to those used by “class I” viral membrane-fusion proteins (24, 25). The best studied such protein is hemagglutinin (HA) from influenza virus (26, 27). Influenza HA exists in two structurally distinct conformations. Its “pre-fusion” conformation on mature virions is a trimer, already cleaved by host proteases into receptor-binding subunit HA1 and membrane-fusion subunit HA2 that remain associated. During the membrane fusion process, HA1 dissociates and HA2 undergoes a dramatic conformational change to reach its “post-fusion” conformation: two heptad repeat (HR) regions from each HA2 subunit, HR-N and HR-C, refold into a six-helix bundle, and a previously buried hydrophobic fusion peptide (FP) becomes exposed and inserts into the host membrane. The cryo-EM structures of α- and β-coronavirus spikes in the pre-fusion conformation have recently been determined (28-31). The overall architecture of α- and β-coronavirus spikes is similar to, albeit more complex than, that of influenza HA. Biochemical studies have identified parts of S2 that form six-helix bundle structures and hence likely correspond to HR-N and HR-C, respectively (32-34), and another part of S2 that associates with membranes and hence likely corresponds to FP (35, 36). It was demonstrated that α-coronavirus spikes are heavily glycosylated, with S2 more heavily glycosylated than S1, as a viral strategy for immune evasion (29). These studies on membrane fusion by α- and β-coronavirus spikes have suggested a common molecular mechanism for membrane fusion shared by coronavirus spikes and other class I viral membrane-fusion proteins (37, 38).

PdCoV from the δ genus is a highly lethal viral pathogen in piglets (39-41). Compared to the extensive studies on α- and β-coronavirus spikes, much less is known about the structure and function of δ-coronavirus spikes. It is not clear which of their S1 domains functions as the RBD, where the structural elements of S2 are located, how δ-coronavirus spikes are structurally and evolutionarily related to the spikes from other genera, or what strategies δ-coronavirus spikes use to evade host immune surveillance. This study fills in these critical gaps by determining the cryo-EM structure of PdCoV spike and revealing its functions in receptor binding, viral entry and immune evasion.

Results and Discussion

Overall structure of PdCoV spike

To capture PdCoV spike in the pre-fusion conformation, we constructed and prepared PdCoV spike ectodomain (S-e) without the transmembrane anchor or intracellular tail (Fig. 1A). We also excluded a short pre-transmembrane region (PTR) because this region is hydrophobic and can adversely affect protein solubility (42). Instead, we replaced these regions with a GCN4 trimerization tag followed by a His6 tag. We expressed PdCoV S-e in insect cells and purified it to homogeneity. We collected cryo-EM data on PdCoV S-e and determined its structure at 3.3 Å resolution (Table 1; Fig. 1B, Fig. 2).

The atomic structure of pre-fusion PdCoV S-e contains residues from 52 to 1017, covering all of the key structural elements except HR-C (Fig. 1A). The overall trimeric structure of PdCoV spike is similar to, but more compact than, those of α- and β-coronavirus spikes: PdCoV spike has a length of 130 Å from S1 to S2 and a width of 50 Å at S2 (Fig. 1C). S2 itself spans 100 Å in length (Fig. 1D). Three S1 subunits form a crown-like structure and sit on top of the trimeric S2 stalk (Fig. 1C, 1D). Three S1-CTDs are located at the top and center of the spike trimer, whereas three S1-NTDs are located on the lower and outer side of S1-CTDs (Fig. 3A, 3B, 3C, 3D). The S1-CTD mainly stacks with the S1-NTD from the same monomeric subunit, although there also exist inter-subunit interactions between S1-CTDs from different subunits and between S1-CTD and S1-NTD from different subunits. In contrast, the S1 trimer of β-genus MHV spike has an intertwined quaternary structure, with S1-CTD from one subunit mainly stacking with S1-NTD from another subunit (Fig. 4A) (30). Like PdCoV spike, the S1-CTD in α-genus HCoV-NL63 spike also mainly stacks with the S1-NTD from the same subunit (Fig. 4B) (29). Moreover, whereas each subunit of PdCoV S1 contains only one S1-NTD, each subunit of HCoV-NL63 S1 contains two, possibly resulting from gene duplication (Fig. 4B) (29). Connecting S1 and S2 are two subdomains, SD1 and SD2, and a long loop (Fig. 3A, 3B). The structure of PdCoV S2 is in the pre-fusion conformation and can be aligned well with those of α- and β-coronavirus S2 fragments (Fig. 4A, 4B). HR-C is missing in both the current PdCoV S2 structure and previously published α- and β-coronavirus S2 structures, suggesting that this region is poorly ordered. Our structural model also includes glycans N-linked to 39 residues on the trimer (13 on each monomeric subunit). In this article, we will illustrate the structures and functions of each of the structural elements in PdCoV spike.

Structure, function, and evolution of PdCoV S1-NTD

PdCoV S1-NTD adopts a β-sandwich fold identical to that of human galectins (Fig. 5A). Its core structure consists of two anti-parallel β-sheet layers: one is seven-stranded and the other is six-stranded. On top of the core structure is a short α-helix. Underneath the core structure is another three-stranded β-sheet and another α-helix. The S1-NTDs from α- and β-coronaviruses have the same galectin fold (Fig. 5B, 5C). Like PdCoV S1-NTD, α-coronavirus S1-NTDs contain a short α-helix on top of the core structure, but β-coronavirus S1-NTDs contain a ceiling-like structure in the same location. The galectin fold of PdCoV S1-NTD suggests that, like some of the α- and β-coronavirus S1-NTDs, PdCoV S1-NTD may recognize sugar as its host receptor to facilitate initial viral attachment to cells, and hence it may function as a viral lectin.

We investigated the sugar-binding capability of PdCoV S1-NTD. To this end, we expressed and purified recombinant PdCoV S1-NTD containing a C-terminal His6 tag, and carried out an ELISA to examine whether it binds sugar (Fig. 5D). More specifically, PdCoV S1-NTD was incubated with mucin, which contains a variety of sugar chains on its surface; subsequently, the mucin-bound PdCoV S1-NTD was detected using antibodies recognizing its His6 tag. The result showed that PdCoV S1-NTD bound to mucin. Thus, PdCoV S1-NTD bound to the sugar moiety of mucin and can potentially recognize sugar as its receptor. The sugar-binding site in PdCoV S1-NTD is currently unknown. Because the sugar-binding site in β-genus BCoV S1-NTD and the galactose-binding site in human galectins are both located on top of the core structure (18, 43), the sugar-binding site in PdCoV S1-NTD may also be located in the same region (Fig. 5A, 5C).

The above structural and functional analyses of PdCoV S1-NTD provide insight into the evolution of coronavirus S1-NTDs from different genera. Previously, based on the structures and functions of β-coronavirus S1-NTDs, we hypothesized that ancestral coronaviruses acquired a galectin gene from the host and incorporated it into their spike gene, which began to encode S1-NTD; we further predicted that the S1-NTDs from other genera also contain the galectin fold. Both the structure of PdCoV S1-NTD presented here and the structures of α-coronavirus S1-NTDs determined by recent studies confirmed our earlier prediction and lent further support to our previous hypothesis. Hence, coronavirus S1-NTDs from different genera likely all have the same evolutionary origin, which might be the host galectin, and have conserved the galectin fold through evolution.

Structure, function, and evolution of PdCoV S1-CTD

PdCoV S1-CTD adopts a β-sandwich fold also containing two β-sheet layers: one is a three-stranded anti-parallel β-sheet and the other is a three-stranded mixed β-sheet (Fig. 6A). Its structure is similar to the β-sandwich core structure of α-coronavirus S1-CTDs, but different from the core structure of β-coronavirus S1-CTDs, which contains a single β-sheet layer (Fig. 6B, 6C). We previously showed that despite their different structural folds, α- and β-coronavirus S1-CTDs share the same structural topology (i.e., connectivity of secondary structural elements) (7). Similarly, PdCoV S1-CTD also shares the same structural topology with β-coronavirus S1-CTDs. Because α- and β-coronaviruses widely use their S1-CTD as the main RBD by recognizing protein receptors, PdCoV S1-CTD may also recognize a protein receptor and function as the main RBD.

We examined the possibility of PdCoV S1-CTD recognizing a receptor on the surface of mammalian cells. To this end, we expressed and purified recombinant PdCoV S1-CTD containing a C-terminal Fc tag, and performed a flow cytometry assay to detect the binding of PdCoV S1-CTD-Fc to mammalian cells (Fig. 6D). Here the cell-bound PdCoV S1-CTD was detected using antibodies recognizing its Fc tag. The result showed that PdCoV S1-CTD-Fc bound to both human and pig cells with significantly higher affinity than Fc alone, suggesting that PdCoV S1-CTD binds to a receptor on the surface of both human and pig cells. Although PdCoV S1-CTD demonstrates higher affinity for human cells than for pig cells, it is unknown whether PdCoV infects human cells, since receptor recognition is only one of several factors that can impact coronavirus infections. We further investigated whether PdCoV S1-CTD recognizes ACE2 or APN, two known protein receptors for α-coronavirus S1-CTDs. To this end, we prepared and purified recombinant PdCoV S1-CTD containing a C-terminal His6 tag, and carried out a dot-blot assay to examine whether it binds ACE2 or APN (Fig. 6E). The result showed that PdCoV S1-CTD does not bind ACE2 or APN. As positive controls, TGEV S1-CTD binds APN, whereas SARS-CoV S1-CTD binds ACE2. Taken together, these results demonstrate that PdCoV S1-CTD likely functions as the main RBD and binds a yet-to-be-identified receptor on the surface of human and pig cells.

The receptor-binding site in PdCoV S1-CTD is currently unknown. In α-coronavirus S1-CTDs, the three loops on the top of the β-sandwich core function as receptor-binding motifs (RBMs) by binding to their respective protein receptor, ACE2 for HCoV-NL63 and APN for TGEV. In PdCoV S1-CTD, the same three loops are structurally similar to their counterparts in α-coronavirus S1-CTDs. Hence, these three loops in PdCoV S1-CTD may bind to a protein receptor and function as RBMs. In the current structure, the S1-CTD is in a closed conformation, with its putative RBMs pointing towards the S1-NTD and unavailable for receptor binding. To bind its receptor, the S1-CTD would need to switch to an open conformation by “standing up” on the spike trimer and rendering the putative RBMs available for receptor binding.

Based on the above structural and functional analyses, we discuss the evolution of coronavirus S1-CTDs. Because S1-CTD is located on the tip of the pre-fusion spike trimer, it is the most exposed region on the surface of virions and thereby is under heavy immune pressure to evolve. Possibly as a consequence of immune pressure, S1-CTD is structurally divergent among different coronavirus genera: α- and δ-coronavirus S1-CTDs have a β-sandwich core, whereas β-coronavirus S1-CTDs have a β-sheet core. The RBMs are located on the very tip of S1-CTDs, and are even more structurally divergent than the core structure of S1-CTDs. The RBMs in α- and δ-coronavirus S1-CTDs are three short discontinuous loops; depending on the virus, their RBM loops can bind APN (as in TGEV), ACE2 (as in HCoV-NL63), or a yet-to-be-identified receptor (as in PdCoV). The RBM in β-coronavirus S1-CTDs is a long continuous subdomain; depending on the virus, their RBM can bind ACE2 (as in SARS-CoV) or DPP4 (as in MERS-CoV). Despite their structural divergence, the S1-CTDs from different genera share the same structural topology in their cores (7). These results suggest that these S1-CTDs have a common evolutionary origin and have undergone divergent evolution. Moreover, our study demonstrates that PdCoV S1-CTD is structurally and evolutionarily more closely related to α-coronavirus S1-CTDs than to β-coronavirus S1-CTDs.

Structures, functions, and evolution of S1 subdomains

The structures of SD1 and SD2 are similar to their counterparts in α- and β-coronavirus spikes (Fig. 3B). SD1 adopts a small β-sandwich fold containing two antiparallel β-sheets: one is two-stranded and the other is five-stranded. SD2 also adopts a small β-sandwich fold containing two three-stranded β-sheets: one is antiparallel and the other is mixed. Interestingly, both SD1 and SD2 consist of discontinuous regions: the majority of their sequences lie C-terminal to S1-CTD, but each also contains a region N-terminal to S1-CTD. Based on these structural data, SD1 and SD2 might have evolved later than S1-NTD and S1-CTD. The main function of the two S1 subdomains is to connect S1 and S2, but SD1 also plays a role in membrane fusion as discussed below.

Structure, function, and evolution of S2

The overall structure of the pre-fusion trimeric PdCoV S2 is similar to those of α- and β-coronaviruses. Two central helices, CH-N and CH-C, from each subunit form a six-helix inter-subunit interface. Based on previous biochemical and structural studies using isolated regions in S2, HR-N corresponds to a region consisting of four helices and connecting loops, and HR-C corresponds to a disordered region (Fig. 7A, 7B) (30). The exact location of FP is uncertain, but it may correspond to a region consisting of two helices and a connecting loop (30). Examination of the pre-fusion and post-fusion structures of influenza HA2 suggests that during the conformational changes of PdCoV S2, HR-N from each subunit in the pre-fusion conformation would need to fold into one long central helix as part of the six-helix bundle of the post-fusion structure (Fig. 7C). Hence, like influenza HA2, part of the CH-C in PdCoV S2 should also be part of the HR-N, such that the other parts of HR-N can anchor upon CH-C and extend towards the membrane-distal direction (Fig. 7A). Like the FP in influenza HA2, the FP in PdCoV S2 would also need to change its conformation, spring out towards the membrane-distal direction, and insert into the target membrane. HR-N and FP are likely locked in their pre-fusion conformation because S1-CTD and SD1 from another subunit sit on top of them, respectively, and prevent them from extending towards the membrane-distal direction. The stacking between S1 and S2 from two different subunits contributes to the compact structure of PdCoV spike trimer. Two protease cleavages, one at the S1/S2 boundary and the other on the N-terminus of FP, can potentially remove the structural restraint of S1 on S2, allowing the conformational changes of S2 to occur (30, 37, 44). Both the structural and mechanistic similarities between coronavirus S2 and influenza HA2 suggest that the two viral membrane-fusion proteins are evolutionarily related (4). The above analysis will need to be confirmed by the atomic structure of post-fusion PdCoV S2.

Immune evasion strategies by PdCoV spike

The structure of PdCoV spike suggests immune evasion strategies by PdCoV spike. First, the PdCoV spike has a compact structure. The six domains and six subdomains of trimeric S1 are tightly packed (Fig. 3B, 3C), which reduces the surface area of the spike protein. Despite its compact structure, S1 maintains the two-RBD system, giving the virus more options in receptor selection than a single-RBD system would. Second, in the current structure, PdCoV S1-CTD is in a closed conformation with its putative RBM loops facing S1-NTD and inaccessible to the host receptor (Fig. 3D). Upon infecting host cells, S1-CTD would need to switch to an open conformation to render the putative RBM loops accessible to the host receptor. The closed-to-open conformational change of S1-CTD has been observed for β-genus MERS-CoV and SARS-CoV spikes (28). This mechanism can minimize the exposure of the putative RBM loops to the immune system. Third, our structural model of PdCoV spike contains glycans N-linked to 39 residues (13 on each subunit); there are also another 24 predicted, but not observed, N-linked glycosylation sites (8 on each subunit) (Fig. 8A, 8B). Most of these sites are located on the surface of S1, which is in contrast to α-genus HCoV-NL63 spike, where S2 is more heavily glycosylated than S1. Thus, while it was previously suggested that HCoV-NL63 spike evades host immune surveillance mainly by glycan shielding its S2 epitopes (29), PdCoV spike appears to evade host immune surveillance mainly by glycan shielding its S1 epitopes. For example, the putative sugar-binding site in PdCoV S1-NTD is surrounded by glycans, which reduces the accessibility of this site to the immune system (Fig. 8C). As a comparison, the sugar-binding site in β-genus BCoV S1-NTD is also shielded, not by glycans, but by the ceiling-like structure on top of the core structure (18). Taken together, PdCoV spike has several structural features that may facilitate viral immune evasion, such as reducing surface areas, concealing receptor-binding sites, and shielding critical S1 epitopes.
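The text does not state how the unobserved N-linked glycosylation sites were predicted. A common approach, assumed here purely for illustration, is to scan the sequence for the N-X-S/T sequon (X being any residue except proline) and then compare the predicted positions with those that carry glycan density in the model. The function names and the toy sequence below are hypothetical and are not taken from this study.

```python
import re

# N-linked glycosylation sequon: Asn, any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons (e.g. "NNTS") are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(sequence, offset=1):
    """Return positions (1-based by default) of the Asn in each N-X-S/T sequon."""
    seq = sequence.upper()
    return [m.start() + offset for m in SEQUON.finditer(seq)]


def split_observed(predicted, observed):
    """Partition predicted sites into those with and without a modeled glycan."""
    observed = set(observed)
    seen = [p for p in predicted if p in observed]
    unseen = [p for p in predicted if p not in observed]
    return seen, unseen


if __name__ == "__main__":
    # Toy sequence for demonstration only (NOT the PdCoV spike sequence).
    toy = "MNGTAANPSQNVSKNNTS"
    sites = find_sequons(toy)
    seen, unseen = split_observed(sites, observed=[11, 15])
    print("predicted sequons:", sites)
    print("with modeled glycan:", seen)
    print("predicted but not observed:", unseen)
```

Applied per subunit, the counts reported above (13 observed and 8 predicted but unobserved) would correspond to 21 sequons per monomer and 63 per trimer.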

Conclusions

In this study we determined the cryo-EM structure of PdCoV spike at 3.3 Å. To our knowledge, this is the first atomic structure of a spike protein from the δ coronavirus genus, which is divergent in amino acid sequence from the well-studied α- and β-coronavirus spikes. Our study reveals a compact PdCoV spike trimer locked in the pre-fusion conformation. The trimeric S1 contains six domains (three copies of S1-NTD and S1-CTD each) and six subdomains (three copies of SD1 and SD2 each) that tightly pack into a crown-like structure. PdCoV S1-NTD has the same galectin fold as α- and β-coronavirus S1-NTDs; it binds sugar and can potentially recognize sugar as its receptor. These results expand our knowledge on the structures and functions of S1-NTDs from different coronavirus genera, and provide further evidence on the common host origin of coronavirus S1-NTDs. PdCoV S1-CTD has the same β-sandwich fold as α-coronavirus S1-CTDs, and this structural fold differs from the β-sheet fold of β-coronavirus S1-CTDs. However, S1-CTDs from all coronavirus genera share the same structural topology, suggesting a common evolutionary origin of coronavirus S1-CTDs. PdCoV S1-CTD binds to an unidentified receptor on mammalian cell surfaces, and may function as the main RBD. Moreover, PdCoV S1-CTD is in a closed conformation with its putative receptor-binding sites buried; it would need to switch to an open conformation for receptor binding. The structures of both S1-NTD and S1-CTD of PdCoV are more similar to those of α-coronaviruses than to those of β-coronaviruses, and hence PdCoV spike is evolutionarily more closely related to α-coronavirus spikes than to β-coronavirus spikes. The trimeric PdCoV S2 forms the stalk of the spike protein. Each of the S2 subunits is locked in the pre-fusion conformation by structural constraint of S1 from a different monomeric subunit. More specifically, HR-N and FP are prevented from re-folding into their post-fusion conformation by the steric restrictions from S1-CTD and SD1, respectively, of another subunit. PdCoV spike possesses several structural features that appear to facilitate its evasion from host immune surveillance, such as its compact structure, the closed conformation of its S1-CTD, and heavy glycosylation near critical epitopes in S1. Overall, our study combines the structure and function of PdCoV spike, and provides many insights into the receptor recognition, membrane fusion, immune evasion, and evolution of PdCoV spike as well as coronavirus spikes in general.

348 
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542 Materials and Methods 

543 Expression, purification, and treatment ofPdCoV spike ectodomain 

PdCoV spike ectodomain (S-e) (residues 18-1077) was cloned into pFastBac vector (Life Technologies Inc.) with an N-terminal honeybee melittin signal peptide and C-terminal GCN4 and His6 tags. It was expressed in Sf9 insect cells using the Bac-to-Bac system (Life Technologies Inc.) and purified as previously described (15). Briefly, the protein was harvested from cell culture medium, and purified sequentially on a Ni-NTA column and a Superdex200 gel filtration column (GE Healthcare). Because we showed earlier that low pH could facilitate trimer formation (45), we incubated PdCoV S-e in buffer containing 0.1 M sodium citrate (pH 5.6) at room temperature for 1 hour, and then re-purified it on a Superdex200 gel filtration column in buffer containing 20 mM Tris pH 7.2 and 200 mM NaCl.

Cryo-electron microscopy

For sample preparation, aliquots of PdCoV S-e (3 μl, 0.35 mg/ml, in buffer containing 2 mM Tris pH 7.2 and 20 mM NaCl) were applied to glow-discharged CF-2/1-4C C-flat grids (Protochips). The grids were then plunge-frozen in liquid ethane using an FEI Mark III Vitrobot system (FEI Company).

For data collection, images were recorded using a Gatan K2 Summit direct electron detector in the direct electron counting mode (Gatan), attached to a Titan-Krios TEM (FEI Company), at Purdue University. The automated software Leginon (46) was used to collect ~2,100 movies at 22,500x magnification and at a defocus range of between 0.5 and 3 μm. Each movie had a total accumulated exposure of 52 e/Å² fractionated in 55 frames of 200 ms exposure. Data collection statistics are summarized in Table 1.
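The exposure settings quoted above imply a per-frame dose and a total exposure time that are not stated explicitly. The short sketch below derives them from the three quoted numbers only; the function and constant names are illustrative, and it assumes the dose is spread evenly across frames.

```python
# Values quoted in the data-collection paragraph.
TOTAL_DOSE = 52.0    # accumulated exposure per movie, e/Å^2
N_FRAMES = 55        # frames per movie
FRAME_TIME_S = 0.2   # 200 ms per frame


def dose_per_frame(total_dose, n_frames):
    """Average dose per frame (e/Å^2), assuming an even split across frames."""
    return total_dose / n_frames


def movie_exposure_time(n_frames, frame_time_s):
    """Total exposure time of one movie in seconds."""
    return n_frames * frame_time_s


def dose_rate(total_dose, n_frames, frame_time_s):
    """Average dose rate in e/Å^2 per second."""
    return total_dose / movie_exposure_time(n_frames, frame_time_s)


if __name__ == "__main__":
    print(f"dose per frame: {dose_per_frame(TOTAL_DOSE, N_FRAMES):.3f} e/Å^2")
    print(f"exposure time:  {movie_exposure_time(N_FRAMES, FRAME_TIME_S):.1f} s")
    print(f"dose rate:      {dose_rate(TOTAL_DOSE, N_FRAMES, FRAME_TIME_S):.2f} e/Å^2/s")
```

With these inputs each frame receives a little under 1 e/Å², and each movie spans about 11 s.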

For data processing, the recorded movies were corrected for beam-induced motion using MotionCor2 (47). The final image was bin-averaged to a pixel size of 1.3 Å. The parameters of the microscope contrast transfer function were estimated for each micrograph using GCTF (48). Particles were automatically picked and extracted using RELION 2.0 on a GPU workstation with a box size of 256 pixels. Initially, particles were subjected to 2D alignment and clustering using RELION 2.0, and the best classes were selected for an additional 2D alignment. Some of the particles on 2D class averages appear to have a tail (Fig. S1A), which may correspond to HR-C. Nevertheless, the weak density of the tail region suggests that this region is poorly ordered, and hence this region was not included in subsequent map calculation and model building. All of the particles, with or without the tail, were subjected to 3D auto-refine with a mask covering the overall shape of the particles (excluding the tail region) to yield the map. The orientations of the particles used in the final reconstruction map sufficiently covered the whole sphere in Fourier space to allow calculation of a 3D map with isotropic resolution. The map was sharpened with the modulation transfer function of the K2 operated at 300 kV using RELION 2.0 post-processing. Reported resolution was based on the gold-standard Fourier shell correlation (FSC) = 0.143 criterion, and FSC curves were corrected for the effects of soft masking by high-resolution noise substitution (49). Data processing statistics are summarized in Table 1.
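To make the FSC = 0.143 criterion concrete, the sketch below reads a resolution off an FSC curve by linear interpolation at the point where the curve first drops below the threshold, and also reports the Nyquist limit implied by the 1.3 Å pixel size. It is a simplified illustration, not RELION's implementation, and the FSC values in the example are made up for demonstration rather than taken from this dataset.

```python
def resolution_at_threshold(freqs, fsc, threshold=0.143):
    """Resolution (Å) at which the FSC curve first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, strictly increasing
    fsc:   FSC value for each frequency shell
    Returns None if the curve never crosses the threshold from above.
    """
    if len(freqs) != len(fsc):
        raise ValueError("freqs and fsc must have the same length")
    for i in range(1, len(freqs)):
        above, below = fsc[i - 1], fsc[i]
        if above >= threshold > below:
            # Linear interpolation between the two shells around the crossing.
            t = (above - threshold) / (above - below)
            f_cross = freqs[i - 1] + t * (freqs[i] - freqs[i - 1])
            return 1.0 / f_cross
    return None


def nyquist_resolution(pixel_size):
    """Finest resolution (Å) representable at a given pixel size (Å)."""
    return 2.0 * pixel_size


if __name__ == "__main__":
    # Hypothetical FSC curve for demonstration only.
    freqs = [0.10, 0.20, 0.30, 0.40]
    fsc = [1.00, 0.80, 0.20, 0.00]
    print("resolution at FSC=0.143:", resolution_at_threshold(freqs, fsc))
    print("Nyquist limit at 1.3 Å/pixel:", nyquist_resolution(1.3))
```

Any reported resolution must be coarser than the Nyquist limit, which is 2.6 Å at a 1.3 Å pixel size, so the 3.3 Å figure is consistent with the stated binning.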

Model building and refinement

For atomic model building, the cryo-EM structure of HCoV-NL63 spike (PDB: 5SZS) was divided into 7 parts (S1-NTD, SD2’, SD1’, S1-CTD, SD1”, SD2” and S2), which were fitted into the cryo-EM map of PdCoV S-e individually using UCSF Chimera (50) and Coot (51). Model rebuilding was performed manually in Coot based on the well-defined continuous density of the main chain, and sequence register assignment was guided mainly by the density of N-linked glycans and of bulky amino acid residues. The structural model was refined using Phenix (52) with geometry restraints and three-fold noncrystallographic symmetry constraints. Refinement and manual model correction in Coot were carried out iteratively until there was no more improvement in geometry parameters. The quality of the final model was analyzed with MolProbity (53) and EMRinger (54). The validation statistics of the structural model are summarized in Table 1.

ELISA sugar-binding assay

PdCoV S1-NTD containing a C-terminal His6 tag was expressed and purified in the same way as PdCoV S-e, and assayed for its sugar-binding capability by ELISA as previously described (18). Briefly, ELISA plates were pre-coated with bovine mucin (1 mg/ml) at 37 °C for 1 hour. After blocking with 1% BSA at 37 °C for 1 hour, PdCoV S1-NTD (1 µg/ml) was added to the plates and incubated with mucin at 37 °C for 1 hour. After washes with PBS buffer, the plates were incubated with anti-His6 antibody (Santa Cruz) at 37 °C for 1 hour. The plates were then washed with PBS and incubated with HRP-conjugated goat anti-mouse IgG antibody (1:5,000) at 37 °C for 1 hour. After further washes with PBS, the enzymatic reaction was carried out using ELISA substrate (Life Technologies Inc.) and stopped with 1 M H₂SO₄. Absorbance at 450 nm (A450) was measured using a Tecan Infinite M1000 PRO Microplate Reader (Tecan Group Ltd.). Five replicates were done for each sample. Porcine epidemic diarrhea virus (PEDV) S1 and SARS-CoV S1-CTD were prepared as previously described (15, 55), and PdCoV S1-CTD was prepared as described below; these three proteins were used in the assay as controls.
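The replicate readings are summarized in the figure legends as S.E.M. error bars and compared by two-tailed t-test. The sketch below shows how those two quantities can be computed from raw absorbance values. The legends do not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the function names and A450 readings are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def sem(values: Sequence[float]) -> float:
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical A450 readings, five replicates each (not data from this study).
sample = [1.00, 1.10, 0.90, 1.20, 0.80]
control = [0.10, 0.20, 0.10, 0.20, 0.15]

t_stat, dof = welch_t(sample, control)
print(f"sample mean = {statistics.mean(sample):.2f}, S.E.M. = {sem(sample):.3f}")
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

The two-tailed P value follows from the t distribution with these degrees of freedom; in practice a library routine such as `scipy.stats.ttest_ind` (with `equal_var=False` for Welch's variant) returns it directly.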

Dot-blot receptor-binding assay

PdCoV S1-CTD containing a C-terminal His6 tag was expressed and purified in the same way as PdCoV S-e, and assayed for its receptor-binding capability using a dot-blot receptor-binding assay as previously described (55). Briefly, 5 µM receptor (human ACE2 or porcine APN) was dotted onto nitrocellulose membranes. The membranes were dried and blocked with 1% BSA, and then incubated with 1 µM PdCoV S1-CTD at 4 °C for 2 hours. After washes with PBS buffer, the membranes were incubated with anti-His6 antibody (Life Technologies Inc.) at 4 °C for 2 hours, washed with PBS, incubated with HRP-conjugated goat anti-mouse IgG antibody (1:5,000) at 4 °C for 2 hours, and washed with PBS. Finally, the receptor-bound proteins were detected using a chemiluminescence reagent (ECL plus, GE Healthcare). Recombinant human ACE2 and porcine APN were prepared as previously described (13, 15).

Flow cytometry cell-binding assay

PdCoV S1-CTD containing a C-terminal Fc tag was expressed, purified, and assayed for its cell-binding capability by flow cytometry as previously described (56). Briefly, human (HeLa and A549) and pig (ST and PK15) cells were incubated with PdCoV S1-CTD-Fc (40 µg/ml), or human IgG-Fc control, at room temperature for 30 min, followed by incubation with fluorescein isothiocyanate (FITC)-labeled anti-human IgG-Fc antibody for 30 min. The cells were then analyzed for binding using flow cytometry.

Table 1. Data and model statistics

Data collection
  Microscope                        Titan Krios
  Voltage (keV)                     300
  Defocus range (µm)                1.0 to 4.0
  Movies                            2168
  Frames per movie                  55
  Dose rate (e⁻/Å²/s)               4.7
  Total dose per movie (e⁻/Å²)      51.7

Data processing
  Particles                         87,002
  Symmetry                          C3
  Provided B-factor (Å²)            -1JU
  Map resolution (Å)                3.3

Model validation
  UCSF Chimera CC (57)              0.865
  EMRinger score (54)               2.77
  MolProbity score (53)             1.91
  All-atom clashscore (53)          5.48
  Rotamer outliers (%)              0.78
  Ramachandran allowed (%)          99.59
  Ramachandran outliers (%)         0.41
  R.m.s. deviations
    Bond length (Å)                 0.009
    Bond angles (°)                 1.437
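The data collection entries in Table 1 can be cross-checked with simple arithmetic. The sketch below assumes a constant dose rate, so that total dose equals dose rate multiplied by exposure time. The exposure time itself is not stated here, so it is derived under that assumption; the function and variable names are hypothetical.

```python
from typing import Tuple


def exposure_summary(
    dose_rate: float, total_dose: float, frames: int
) -> Tuple[float, float, float]:
    """Derive exposure time, time per frame, and dose per frame.

    dose_rate:  electrons per square angstrom per second.
    total_dose: electrons per square angstrom per movie.
    frames:     number of frames per movie.
    Assumes a constant dose rate over the whole exposure.
    """
    exposure_s = total_dose / dose_rate      # seconds per movie
    frame_time_s = exposure_s / frames       # seconds per frame
    dose_per_frame = total_dose / frames     # electrons per square angstrom
    return exposure_s, frame_time_s, dose_per_frame


# Values taken from Table 1.
exposure_s, frame_time_s, dose_per_frame = exposure_summary(4.7, 51.7, 55)

print(f"exposure per movie: {exposure_s:.1f} s")        # 11.0 s
print(f"time per frame:     {frame_time_s:.2f} s")      # 0.20 s
print(f"dose per frame:     {dose_per_frame:.2f} e-/A^2")  # 0.94 e-/A^2
```

Under this assumption the three tabulated values are mutually consistent: an 11 s exposure split into 55 frames of 0.2 s each.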
Figure Legends:

Figure 1. Overall structure of PdCoV S-e in the pre-fusion conformation. (A) Schematic drawing of PdCoV S-e (spike ectodomain). S1: receptor-binding subunit. S2: membrane-fusion subunit. GCN4-His6: GCN4 trimerization tag followed by His6 tag. S1-NTD: N-terminal domain of S1. S1-CTD: C-terminal domain of S1. CH-N and CH-C: central helices N and C. FP: fusion peptide. HR-N and HR-C: heptad repeats N and C. Residues in shaded regions (N-terminus, GCN4 tag, and His6 tag) were not traced in the structure. (B) Cryo-EM maps of PdCoV S-e with the atomic model fitted in. The maps are contoured at 6.6 σ. (C) Cryo-EM structure of pre-fusion PdCoV S-e. Each of the monomeric subunits is colored differently. (D) Structure of a monomeric subunit in the pre-fusion conformation. The structural elements are colored in the same way as in panel (A).

Figure 2. Cryo-EM data analysis of PdCoV S-e. (A) Representative micrographs of frozen-hydrated PdCoV S-e particles and representative 2D class averages in different orientations. Arrow indicates a poorly ordered tail region in some of the particles. (B) Gold-standard Fourier shell correlation (FSC) curves. The resolution was determined to be 3.3 Å. The 0.143 and 0.5 cut-off values are indicated by horizontal grey bars. (C) Final cryo-EM map of PdCoV S-e colored according to the local resolution.

Figure 3. Structure of PdCoV S1. (A) Schematic drawing of PdCoV S1. SD1: subdomain 1. SD2: subdomain 2. SD1 consists of two discontinuous regions, SD1’ and SD1”. SD2 consists of two discontinuous regions, SD2’ and SD2”. (B) Structure of monomeric S1. Domains and subdomains are colored in the same way as in panel (A). Residue ranges for each of the domains and subdomains are labeled. (C) Structure of trimeric S1, viewed from the side. Each of the monomeric subunits is colored differently. The empty space under S1 is occupied by S2, which is not shown here. (D) Structure of trimeric S1, viewed from the top. Each of the monomeric subunits is colored differently.

Figure 4. Structural alignments of PdCoV spike with the spikes from other coronavirus genera. (A) Alignment of PdCoV and β-genus MHV spikes. PdCoV spike is colored in magenta. MHV spike (PDB ID: 3JCL) is colored in cyan. (B) Alignment of PdCoV and α-genus HCoV-NL63 spikes. PdCoV spike is colored in magenta. HCoV-NL63 spike (PDB ID: 5SZS) is colored in green. Each subunit of PdCoV S1 contains only one S1-NTD, whereas each subunit of HCoV-NL63 S1 contains two.

Figure 5. Structure and function of PdCoV S1-NTD. (A) Structure of PdCoV S1-NTD. The putative sugar-binding site is indicated by the question mark. (B) Structure of α-genus HCoV-NL63 S1-NTD (PDB ID: 5SZS). (C) Structure of β-genus BCoV S1-NTD (PDB ID: 4H14). (D) ELISA sugar-binding assay for PdCoV S1-NTD. Here the ELISA plates were pre-coated with sugar-rich mucin, and then PdCoV S1-NTD was added and incubated with mucin. Mucin-bound S1-NTD was detected using antibodies recognizing its C-terminal His6 tag. Porcine epidemic diarrhea virus (PEDV) S1 was used as the positive control; PdCoV S1-CTD, SARS-CoV S1-CTD, and BSA were used as negative controls. A plate without mucin was used as an additional negative control. Statistical analyses were performed using a two-tailed t-test. Error bars indicate S.E.M. (n=5). *** P<0.001.

Figure 6. Structure and function of PdCoV S1-CTD. (A) Structure of PdCoV S1-CTD. The putative RBM loops are indicated by the question mark. (B) Structure of α-genus HCoV-NL63 S1-CTD (PDB ID: 3KBH). (C) Structure of β-genus SARS-CoV S1-CTD (PDB ID: 2AJF). (D) Flow cytometry assay for the binding of PdCoV S1-CTD to the surface of mammalian cells. Cell-bound PdCoV S1-CTD was detected using antibodies recognizing its C-terminal Fc tag. Fc or cells only were used as negative controls. Statistical analyses were performed using a two-tailed t-test. Error bars indicate S.E.M. (n=4). *** P<0.001. (E) Dot-blot receptor-binding assay for PdCoV S1-CTD. Here the receptor (either APN or ACE2) was first dotted onto a membrane. Subsequently, PdCoV S1-CTD was dotted and incubated with the receptor. Receptor-bound S1-CTD was detected using antibodies recognizing its C-terminal His6 tag. TGEV and SARS-CoV S1-CTDs were used as positive controls. PBS buffer was used as a negative control.

Figure 7. Structure and function of PdCoV S2. (A) Structure of the pre-fusion monomeric PdCoV S2, showing only CH-C, HR-N and FP. Arrow indicates the direction in which HR-N would need to extend to reach the post-fusion conformation. Question mark indicates the part of CH-C that is likely part of HR-N. Residue ranges for each of the structural elements are labeled. (B) S1-CTD and SD1 from a different subunit stack with HR-N and FP, respectively, preventing them from switching to their post-fusion conformation. Scissors indicate the proteolysis sites N-terminal to FP. (C) Structures of influenza HA2 in the pre-fusion and post-fusion conformations (PDB IDs: 2YPG and 1QU1). Arrow indicates the direction in which HR-N would need to extend to reach the post-fusion conformation. Scissors indicate the proteolysis sites N-terminal to FP.

Figure 8. Glycosylation sites on the surface of PdCoV spike. (A) Distribution of N-linked glycosylation sites on the one-dimensional structure of PdCoV spike. Ψ indicates an N-linked glycosylation site. Those on the top indicate glycans observed in the structure. Those at the bottom indicate predicted, but not observed, glycosylation sites. Predicted glycosylation sites in the N-terminal region and HR-C were not included because these two regions were not traced in the structure. (B) Distribution of N-linked glycosylation sites on the three-dimensional structure of PdCoV spike. Observed glycans are in dark blue. Predicted, but not observed, glycosylation sites are in light blue. (C) Distribution of N-linked glycosylation sites in monomeric S1. Question marks indicate the putative sugar-binding site in S1-NTD and the putative RBMs in S1-CTD, respectively.
